WHAT IS CLAIMED IS: 



1. A method comprising:
    adding a first plurality of data elements to a second plurality of data elements generating a plurality of intermediate results;
    adding two of the plurality of intermediate results and repeating with different combinations of the plurality of intermediate results generating a plurality of sum results; and
    discarding the two least significant bits of each sum result of the plurality of sum results.

2. The method as recited in Claim 1, further comprising:
    performing a carry in of a value of one when performing the adding the first plurality of data elements to the second plurality of data elements.

3. The method as recited in Claim 1, further comprising:
    performing a carry in of a rounding term when performing the adding the two of the plurality of intermediate results and the repeating.

4. The method as recited in Claim 3, wherein the rounding term is a variable capable of having a value of one and, at a different time, a value of zero.

5. The method as recited in Claim 1, further comprising:
    performing a carry in of a value of one when adding the first plurality of data elements to the second plurality of data elements; and
    performing a carry in of a rounding term when adding the two of the plurality of intermediate results and when repeating.

6. The method as recited in Claim 1, wherein the first plurality of data elements and the second plurality of data elements each comprise eight eight-bit unsigned data elements.

7. The method as recited in Claim 1, wherein the first plurality of data elements and the second plurality of data elements each comprise eight sixteen-bit data elements.

8. The method as recited in Claim 1, wherein the method comprises executing a Single-Instruction/Multiple-Data (SIMD) instruction.

9. The method as recited in Claim 1, wherein the method is performed utilizing Single-Instruction/Multiple-Data (SIMD) circuitry.

10. A method comprising:
    adding an ith data element of a first source to an ith data element of a second source creating an ith intermediate result for i = 1 to N, wherein N is an integer greater than 1;
    adding a jth intermediate result to a (j+1)th intermediate result creating a jth sum result for j = 1 to (N-1); and
    discarding two least significant bits of each jth sum result.
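
The three steps recited in Claim 10 can be sketched in software as follows. This is only an illustrative model, not the claimed method or circuitry itself: the function name `adjacent_sums_truncated` and the use of Python lists for the two sources are assumptions introduced here, and the sketch uses zero-based indices where the claim counts from 1.

```python
def adjacent_sums_truncated(src1, src2):
    """Hypothetical helper modelling the steps of Claim 10 without carry-ins."""
    if len(src1) != len(src2) or len(src1) < 2:
        raise ValueError("both sources need the same length N, with N > 1")

    # Step 1: the ith intermediate result is src1[i] + src2[i].
    intermediate = [a + b for a, b in zip(src1, src2)]

    # Step 2: the jth sum result adds neighbouring intermediate results,
    # giving N - 1 sum results in total.
    sums = [intermediate[j] + intermediate[j + 1]
            for j in range(len(intermediate) - 1)]

    # Step 3: discard the two least significant bits of each sum result.
    return [s >> 2 for s in sums]


if __name__ == "__main__":
    print(adjacent_sums_truncated([1, 2, 3, 4], [5, 6, 7, 8]))  # [3, 4, 5]
```

Each output is the sum of four input elements divided by four with the remainder dropped, so a source of N elements yields N-1 outputs.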

11. The method as recited in Claim 10, further comprising:
    performing a carry in of a value of one when adding the ith data element of the first source to the ith data element of the second source; and
    performing a carry in of a rounding term when adding the jth intermediate result to the (j+1)th intermediate result.

12. The method as recited in Claim 11, wherein the rounding term is selected from a group consisting of a value of one and a value of zero.
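
The effect of the carry-ins recited in Claims 11 and 12 can be illustrated with a small software model. The sketch below is an assumption-laden illustration rather than the claimed implementation: the function name `adjacent_sums_rounded` and the parameter name `rounding_term` are hypothetical. Read literally, each first addition gains a carry-in of one and each second addition gains the rounding term R, so every four-element sum is increased by 2 + R before the two least significant bits are discarded.

```python
def adjacent_sums_rounded(src1, src2, rounding_term=0):
    """Hypothetical model of Claims 10-12: carry-ins applied before truncation."""
    if rounding_term not in (0, 1):
        raise ValueError("rounding_term must be 0 or 1")
    if len(src1) != len(src2) or len(src1) < 2:
        raise ValueError("both sources need the same length N, with N > 1")

    # First adders: carry in a value of one on every element-wise addition.
    intermediate = [a + b + 1 for a, b in zip(src1, src2)]

    # Second adders: carry in the rounding term when adding neighbours.
    sums = [intermediate[j] + intermediate[j + 1] + rounding_term
            for j in range(len(intermediate) - 1)]

    # Discard the two least significant bits of each sum result.
    return [s >> 2 for s in sums]


if __name__ == "__main__":
    # The four inputs total 5: 5 + 2 = 7 gives 1, while 5 + 3 = 8 gives 2.
    print(adjacent_sums_rounded([1, 1], [1, 2], rounding_term=0))  # [1]
    print(adjacent_sums_rounded([1, 1], [1, 2], rounding_term=1))  # [2]
```

With a rounding term of zero, the added 2 is the usual half-of-the-divisor bias for rounding a division by four to the nearest integer; a rounding term of one biases the result upward by one more unit before truncation.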

13. The method as recited in Claim 10, wherein N = 8.

14. The method as recited in Claim 10, wherein the method is performed during execution of a Single-Instruction/Multiple-Data (SIMD) instruction.

15. An apparatus comprising:
    a plurality of first adders, each first adder of the plurality of first adders operative to add two operands of a plurality of operands into one of a plurality of intermediate results;
    a plurality of second adders, each second adder of the plurality of second adders operative to add two intermediate results of the plurality of intermediate results into one of a plurality of sum results; and
    discard circuitry operative to discard the two least significant bits of each sum result of the plurality of sum results.

16. The apparatus as recited in Claim 15, wherein the plurality of first adders comprises eight first adders and the plurality of second adders comprises seven second adders.

17. The apparatus as recited in Claim 15, wherein the discard circuitry comprises a plurality of shift registers.

18. The apparatus as recited in Claim 15, wherein each of the first adders is operative to add two eight-bit input operands producing a nine-bit intermediate operand and each of the second adders is operative to add two nine-bit intermediate operands producing a ten-bit output operand.

19. The apparatus as recited in Claim 15, wherein each of the first adders is operative to add two sixteen-bit input operands producing a seventeen-bit intermediate operand and each of the second adders is operative to add two seventeen-bit intermediate operands producing an eighteen-bit operand.
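
The operand widths recited in Claims 18 and 19 can be checked with a short worst-case calculation. The sketch below is illustrative only and assumes unsigned operands and a worst-case carry-in of one at each adder stage; the function name `adder_widths` is hypothetical.

```python
def adder_widths(input_bits):
    """Hypothetical check of worst-case widths for unsigned operands.

    Returns (intermediate_bits, sum_bits, result_bits).
    """
    max_input = (1 << input_bits) - 1

    # First adder: two maximal inputs plus an assumed carry-in of one.
    max_intermediate = max_input + max_input + 1

    # Second adder: two maximal intermediates plus an assumed carry-in of one.
    max_sum = max_intermediate + max_intermediate + 1

    # After discarding the two least significant bits.
    max_result = max_sum >> 2

    return (max_intermediate.bit_length(),
            max_sum.bit_length(),
            max_result.bit_length())


if __name__ == "__main__":
    print(adder_widths(8))   # (9, 10, 8)
    print(adder_widths(16))  # (17, 18, 16)
```

Under these assumptions each adder stage needs one extra bit to avoid overflow, and discarding two bits returns the result to the original operand width.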

20. The apparatus as recited in Claim 15, wherein routing of the plurality of operands and the plurality of intermediate results to the plurality of first adders and the plurality of second adders is selected according to microcode identified by a Single-Instruction/Multiple-Data (SIMD) instruction.

21. The apparatus as recited in Claim 15, wherein routing of the plurality of operands and the plurality of intermediate results to the plurality of first adders and the plurality of second adders is selected according to decode logic and a Single-Instruction/Multiple-Data (SIMD) instruction.

22. The apparatus as recited in Claim 15, wherein the plurality of first adders, the plurality of second adders, and the discard circuitry form a Single-Instruction/Multiple-Data (SIMD) instruction execution circuit.

23. An apparatus comprising:
    a plurality of first adders operative to add an ith data element of a first source to an ith data element of a second source generating an ith intermediate result for i = 1 to N, wherein N is an integer greater than 1;
    a plurality of second adders operative to add a jth intermediate result to a (j+1)th intermediate result generating a jth sum result for j = 1 to (N-1); and
    circuitry operative to discard two least significant bits of each jth sum result.

24. The apparatus as recited in Claim 23, wherein the circuitry comprises a plurality of shift registers.

25. The apparatus as recited in Claim 23, wherein routing of the plurality of operands and the plurality of intermediate results to the plurality of first adders and the plurality of second adders is selected according to microcode identified by a Single-Instruction/Multiple-Data (SIMD) instruction.

26. A method comprising:
    decoding an instruction identifying an averaging operation;
    executing the instruction on a first source and a second source, wherein the first source comprises a first plurality of data elements and the second source comprises a second plurality of data elements; and
    storing a result, wherein the result comprises a third plurality of data elements;
    wherein the executing the instruction comprises:
        adding successive ones of the first plurality of data elements to successive ones of the second plurality of data elements generating a plurality of intermediate results;
        adding two of the plurality of intermediate results and repeating with different combinations of the plurality of intermediate results generating a plurality of sum results; and
        discarding the two least significant bits of each sum result of the plurality of sum results generating the result.

27. The method as recited in Claim 26, wherein the executing the instruction further comprises:
    performing a carry in of a value of one when adding the successive ones of the first plurality of data elements to the successive ones of the second plurality of data elements; and
    performing a carry in of a rounding term when adding the two of the plurality of intermediate results and when repeating.

28. The method as recited in Claim 27, wherein the rounding term is selected from a group consisting of a value of one and a value of zero.

29. An apparatus comprising:
    a coprocessor interface unit to identify an instruction for an averaging operation, a first source having a first plurality of data elements and a second source having a second plurality of data elements;
    an execution unit to perform the averaging operation on the first plurality of data elements and the second plurality of data elements; and
    a register to store a result having a third plurality of data elements;
    wherein the execution unit is operative to:
        add successive ones of the first plurality of data elements to successive ones of the second plurality of data elements generating a plurality of intermediate results;
        add two of the plurality of intermediate results and repeat with different combinations of the plurality of intermediate results generating a plurality of sum results; and
        discard the two least significant bits of each sum result of the plurality of sum results forming the result.

30. The apparatus as recited in Claim 29, wherein the execution unit is further operative to:
    perform a carry in of a value of one when adding the successive ones of the first plurality of data elements to the successive ones of the second plurality of data elements; and
    perform a carry in of a rounding term when adding the two of the plurality of intermediate results and when repeating.

31. The apparatus as recited in Claim 30, wherein the rounding term is selected from a group consisting of a value of one and a value of zero.

32. A data processing system comprising:
    an addressable memory to store an instruction for an averaging operation;
    a processing core coupled to the addressable memory, the processing core comprising:
        an execution core to access the instruction;
        a first source register to store a first plurality of data elements;
        a second source register to store a second plurality of data elements; and
        a destination register to store a plurality of results of the averaging operation;
    a wireless interface to receive a digital signal comprising a third plurality of data elements; and
    an I/O system to provide the first and second plurality of data elements to the first and second source registers from the third plurality of data elements;
    wherein the execution core is operative to:
        add successive ones of the first plurality of data elements to successive ones of the second plurality of data elements generating a plurality of intermediate results;
        add two of the plurality of intermediate results and repeat with different combinations of the plurality of intermediate results generating a plurality of sum results; and
        discard the two least significant bits of each sum result of the plurality of sum results generating the plurality of results.

33. The data processing system as recited in Claim 32, wherein the execution core is further operative to:
    perform a carry in of a value of one when adding the successive ones of the first plurality of data elements to the successive ones of the second plurality of data elements; and
    perform a carry in of a rounding term when adding the two of the plurality of intermediate results and when repeating.

34. The data processing system as recited in Claim 33, wherein the rounding term is a variable capable of having a value of one and, at a different time, a value of zero.

35. An article comprising a machine-readable medium that includes machine readable instructions, the instructions operative to cause a machine to:
    add a first plurality of data elements to a second plurality of data elements generating a plurality of intermediate results;
    add two of the plurality of intermediate results and repeat with different combinations of the plurality of intermediate results generating a plurality of sum results; and
    discard the two least significant bits of each sum result of the plurality of sum results generating a result.

36. The article as recited in Claim 35, the instructions further operative to:
    perform a carry in of a value of one when adding the first plurality of data elements to the second plurality of data elements; and
    perform a carry in of a rounding term when adding the two of the plurality of intermediate results and when repeating.